Skip to content

Conversation

@ChALkeR
Copy link
Member

@ChALkeR ChALkeR commented Dec 18, 2025

Tracking: #61041

This significantly improves both utf8 TextDecoder and Buffer#toString() performance on ASCII

And removes the main reason why import { TextDecoder } from '@exodus/bytes/encoding.js' beats both Node.js TextDecoder and Node.js Buffer on Node.js on utf-8 🙃

See #61041 (comment)

Using https://github.com/lemire/jstextdecoderbench by @lemire:

Buffer#toString() - utf8

main:

Test Size Throughput Mean Time
Latin lipsum (ASCII) 84.902 KiB 17.60 GiB/s 0.005 ms
Arabic lipsum 79.771 KiB 0.26 GiB/s 0.294 ms
Chinese lipsum 68.203 KiB 0.32 GiB/s 0.212 ms

PR:

Test Size Throughput Mean Time
Latin lipsum (ASCII) 84.902 KiB 36.74 GiB/s 0.002 ms
Arabic lipsum 79.771 KiB 0.27 GiB/s 0.280 ms
Chinese lipsum 68.203 KiB 0.33 GiB/s 0.199 ms

TextDecoder, loose

main:

Test Size Throughput Mean Time
Latin lipsum (ASCII) 84.902 KiB 17.90 GiB/s 0.005 ms
Arabic lipsum 79.771 KiB 0.27 GiB/s 0.286 ms
Chinese lipsum 68.203 KiB 0.33 GiB/s 0.200 ms

PR:

Test Size Throughput Mean Time
Latin lipsum (ASCII) 84.902 KiB 36.61 GiB/s 0.002 ms
Arabic lipsum 79.771 KiB 0.27 GiB/s 0.279 ms
Chinese lipsum 68.203 KiB 0.33 GiB/s 0.208 ms

TextDecoder, fatal

main:

Test Size Throughput Mean Time
Latin lipsum (ASCII) 84.902 KiB 15.17 GiB/s 0.006 ms
Arabic lipsum 79.771 KiB 0.26 GiB/s 0.292 ms
Chinese lipsum 68.203 KiB 0.32 GiB/s 0.206 ms

PR:

Test Size Throughput Mean Time
Latin lipsum (ASCII) 84.902 KiB 36.57 GiB/s 0.002 ms
Arabic lipsum 79.771 KiB 0.27 GiB/s 0.286 ms
Chinese lipsum 68.203 KiB 0.31 GiB/s 0.207 ms

cc @nodejs/performance

@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Dec 18, 2025
@ChALkeR ChALkeR force-pushed the chalker/decoder/ascii/0 branch 4 times, most recently from 0bfc0ec to 4f82c5a Compare December 18, 2025 21:54
@codecov
Copy link

codecov bot commented Dec 19, 2025

Codecov Report

❌ Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 88.53%. Comparing base (4f24aff) to head (42b4ff4).
⚠️ Report is 57 commits behind head on main.

Files with missing lines Patch % Lines
src/encoding_binding.cc 92.30% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #61119      +/-   ##
==========================================
- Coverage   88.53%   88.53%   -0.01%     
==========================================
  Files         703      703              
  Lines      208546   208555       +9     
  Branches    40217    40215       -2     
==========================================
+ Hits       184634   184638       +4     
+ Misses      15926    15923       -3     
- Partials     7986     7994       +8     
Files with missing lines Coverage Δ
src/string_bytes.cc 69.74% <100.00%> (-0.77%) ⬇️
src/encoding_binding.cc 55.33% <92.30%> (+0.72%) ⬆️

... and 43 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Member

@lemire lemire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. It is simple, likely quite safe.

@ChALkeR ChALkeR force-pushed the chalker/decoder/ascii/0 branch from 4f82c5a to 866781f Compare December 19, 2025 05:11
@ChALkeR
Copy link
Member Author

ChALkeR commented Dec 19, 2025

Upd: removed useless reinterpret_cast, otherwise unchanged

@ChALkeR
Copy link
Member Author

ChALkeR commented Dec 19, 2025

@lemire I notice that !simdutf::validate_ascii_with_errors().error is consistently faster than simdutf::validate_ascii()
(36 GiB/s vs 34-35 GiB/s on ASCII-utf8-to-text conversion)

Why is that and is there a reason to prefer the latter?
Code around seems to be using the former, should I just switch to validate_ascii_with_errors?

Upd, ah: #46271 (comment), updated PR

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Dec 19, 2025
@github-actions github-actions bot added request-ci-failed An error occurred while starting CI via request-ci label, and manual interventon is needed. and removed request-ci Add this label to start a Jenkins CI on a PR. labels Dec 19, 2025
@github-actions
Copy link
Contributor

Failed to start CI
   ⚠  Commits were pushed since the last approving review:
   ⚠  - src: improve StringBytes::Encode perf on ASCII
   ⚠  - src: use validate_ascii_with_errors
   ✘  Refusing to run CI on potentially unsafe PR
https://github.com/nodejs/node/actions/runs/20370786084

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Dec 19, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Dec 19, 2025
@nodejs-github-bot
Copy link
Collaborator

@RafaelGSS RafaelGSS added the performance Issues and PRs related to the performance of Node.js. label Dec 19, 2025
Copy link
Member

@gurgunday gurgunday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@nodejs-github-bot
Copy link
Collaborator

@panva panva added author ready PRs that have at least one approval, no pending requests for changes, and a CI started. and removed request-ci-failed An error occurred while starting CI via request-ci label, and manual interventon is needed. labels Dec 20, 2025
@meixg meixg added commit-queue Add this label to land a pull request using GitHub Actions. and removed commit-queue Add this label to land a pull request using GitHub Actions. labels Dec 25, 2025
@aduh95 aduh95 added the commit-queue Add this label to land a pull request using GitHub Actions. label Dec 26, 2025
@nodejs-github-bot nodejs-github-bot added commit-queue-failed An error occurred while landing this pull request using GitHub Actions. and removed commit-queue Add this label to land a pull request using GitHub Actions. labels Dec 26, 2025
@nodejs-github-bot
Copy link
Collaborator

Commit Queue failed
- Loading data for nodejs/node/pull/61119
✔  Done loading data for nodejs/node/pull/61119
----------------------------------- PR info ------------------------------------
Title      src: improve StringBytes::Encode perf on ASCII (#61119)
Author     Nikita Skovoroda <chalkerx@gmail.com> (@ChALkeR)
Branch     ChALkeR:chalker/decoder/ascii/0 -> nodejs:main
Labels     buffer, c++, performance, author ready, needs-ci
Commits    2
 - src: improve StringBytes::Encode perf on ASCII
 - src: use validate_ascii_with_errors
Committers 1
 - Nikita Skovoroda <chalkerx@gmail.com>
PR-URL: https://github.com/nodejs/node/pull/61119
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Gürgün Dayıoğlu <hey@gurgun.day>
Reviewed-By: Xuguang Mei <meixuguang@gmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
------------------------------ Generated metadata ------------------------------
PR-URL: https://github.com/nodejs/node/pull/61119
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Gürgün Dayıoğlu <hey@gurgun.day>
Reviewed-By: Xuguang Mei <meixuguang@gmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
--------------------------------------------------------------------------------
   ℹ  This PR was created on Thu, 18 Dec 2025 21:39:48 GMT
   ✔  Approvals: 8
   ✔  - Daniel Lemire (@lemire): https://github.com/nodejs/node/pull/61119#pullrequestreview-3596016107
   ✔  - Matteo Collina (@mcollina) (TSC): https://github.com/nodejs/node/pull/61119#pullrequestreview-3599284734
   ✔  - Colin Ihrig (@cjihrig): https://github.com/nodejs/node/pull/61119#pullrequestreview-3599469699
   ✔  - Rafael Gonzaga (@RafaelGSS) (TSC): https://github.com/nodejs/node/pull/61119#pullrequestreview-3599517939
   ✔  - Gürgün Dayıoğlu (@gurgunday): https://github.com/nodejs/node/pull/61119#pullrequestreview-3599880181
   ✔  - Xuguang Mei (@meixg): https://github.com/nodejs/node/pull/61119#pullrequestreview-3600615411
   ✔  - Juan José Arboleda (@juanarbol): https://github.com/nodejs/node/pull/61119#pullrequestreview-3601719156
   ✔  - Ruben Bridgewater (@BridgeAR) (TSC): https://github.com/nodejs/node/pull/61119#pullrequestreview-3608118750
   ✔  Last GitHub CI successful
   ℹ  Last Full PR CI on 2025-12-20T15:52:57Z: https://ci.nodejs.org/job/node-test-pull-request/70550/
- Querying data for job/node-test-pull-request/70550/
✔  Build data downloaded
   ✔  Last Jenkins CI successful
--------------------------------------------------------------------------------
   ✔  No git cherry-pick in progress
   ✔  No git am in progress
   ✔  No git rebase in progress
--------------------------------------------------------------------------------
- Bringing origin/main up to date...
From https://github.com/nodejs/node
 * branch                  main       -> FETCH_HEAD
✔  origin/main is now up-to-date
- Downloading patch for 61119
From https://github.com/nodejs/node
 * branch                  refs/pull/61119/merge -> FETCH_HEAD
✔  Fetched commits as 42d0e1307b11..42b4ff4a114e
--------------------------------------------------------------------------------
[main eed8d49a4b] src: improve StringBytes::Encode perf on ASCII
 Author: Nikita Skovoroda <chalkerx@gmail.com>
 Date: Fri Dec 19 01:32:42 2025 +0400
 2 files changed, 25 insertions(+), 7 deletions(-)
[main a8535368ca] src: use validate_ascii_with_errors
 Author: Nikita Skovoroda <chalkerx@gmail.com>
 Date: Fri Dec 19 13:30:57 2025 +0400
 2 files changed, 2 insertions(+), 2 deletions(-)
   ✔  Patches applied
There are 2 commits in the PR. Attempting autorebase.
(node:2249) [DEP0190] DeprecationWarning: Passing args to a child process with shell option true can lead to security vulnerabilities, as the arguments are not escaped, only concatenated.
(Use `node --trace-deprecation ...` to show where the warning was created)
Rebasing (2/4)
Executing: git node land --amend --yes
--------------------------------- New Message ----------------------------------
src: improve StringBytes::Encode perf on ASCII

PR-URL: #61119
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Gürgün Dayıoğlu <hey@gurgun.day>
Reviewed-By: Xuguang Mei <meixuguang@gmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>

[detached HEAD 9d82a3ab21] src: improve StringBytes::Encode perf on ASCII
Author: Nikita Skovoroda <chalkerx@gmail.com>
Date: Fri Dec 19 01:32:42 2025 +0400
2 files changed, 25 insertions(+), 7 deletions(-)
Rebasing (3/4)
Rebasing (4/4)
Executing: git node land --amend --yes
--------------------------------- New Message ----------------------------------
src: use validate_ascii_with_errors

PR-URL: #61119
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Gürgün Dayıoğlu <hey@gurgun.day>
Reviewed-By: Xuguang Mei <meixuguang@gmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>

[detached HEAD f40004bd9d] src: use validate_ascii_with_errors
Author: Nikita Skovoroda <chalkerx@gmail.com>
Date: Fri Dec 19 13:30:57 2025 +0400
2 files changed, 2 insertions(+), 2 deletions(-)
Successfully rebased and updated refs/heads/main.

ℹ Add commit-queue-squash label to land the PR as one commit, or commit-queue-rebase to land as separate commits.

https://github.com/nodejs/node/actions/runs/20530896393

@aduh95 aduh95 added commit-queue Add this label to land a pull request using GitHub Actions. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. and removed commit-queue-failed An error occurred while landing this pull request using GitHub Actions. labels Dec 27, 2025
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Dec 27, 2025
@nodejs-github-bot nodejs-github-bot merged commit 36ffbc5 into nodejs:main Dec 27, 2025
90 checks passed
@nodejs-github-bot
Copy link
Collaborator

Landed in 36ffbc5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

author ready PRs that have at least one approval, no pending requests for changes, and a CI started. buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. needs-ci PRs that need a full CI run. performance Issues and PRs related to the performance of Node.js.

Projects

None yet

Development

Successfully merging this pull request may close these issues.